feat: Add Inductor backend configs #688
Merged
## Overview

This PR introduces a flexible configuration system for PyTorch Inductor backend with 8 predefined config templates, CUDA Graphs compatibility fix, and comprehensive unit tests (28 tests total).

## Changes

- Inductor backend with 8 config templates (triton, cpp_wrapper, cutlass, aten, cudagraphs, max_autotune, freezing, tma)
- CUDA Graphs output buffer overwrite fix in test_compiler.py
- 28 unit tests in test/inductor_backend_test.py

## Testing

- All config keys verified against PyTorch 2.7.1 source code
- All templates tested with actual model compilation
- Unit tests pass: 28/28 OK
- TMA config gracefully falls back on non-TMA GPUs (A100)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for your contribution!
…integration

- Rename to for clarity
- Remove redundant and templates (merge into )
- Add mutual exclusion check: exactly one of template/mode/options can be specified
- Remove global config modification; use torch.compile's parameter directly
- Clean up mapping (template no longer affects mode)
- Update tests to reflect template exclusivity and parameter changes
- Simplify __call__ with inline conditional kwargs expansion

Templates now exclusively control torch._inductor.config options, while is passed directly to torch.compile without interference.
## Overview

This PR introduces configuration for the PyTorch Inductor backend, allowing users to select predefined config templates that set groups of `torch._inductor.config` overrides. This extends PyTorch's official "mode" concept while maintaining full compatibility with the existing `test_compiler.py` framework.

## Motivation
Previously, `InductorBackend` accepted only basic config parameters through individual `inductor_config` dictionary entries. Users could not easily enable common combinations of Inductor options.

This PR addresses these limitations by introducing config templates: predefined, well-tested combinations of `torch._inductor.config` options that users can select by name. Templates, mode, and options are mutually exclusive, providing clear separation of concerns.
## Changes Summary

### 1. Inductor Backend Configuration Templates

File: `graph_net_bench/torch/backend/inductor_backend.py`

#### Features

- `_INDUCTOR_CONFIG_TEMPLATES` dictionary with 6 predefined templates
- `graph_net_inductor_config_template` - select template by name
- `options` parameter directly

#### Supported Templates

| Template | Config overrides |
|---|---|
| `triton` | `cpp_wrapper: False` |
| `cpp_wrapper` | `cpp_wrapper: True` |
| `cudagraphs` | `triton.cudagraphs: True` |
| `max_autotune` | |
| `freezing` | `freezing: True` |
| `tma` | `triton.enable_persistent_tma_matmul: True` |

Note that the TMA template works universally across GPU architectures: no runtime error occurs on GPUs without TMA support.
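As a rough illustration of how a template name could be turned into `torch.compile` arguments, the sketch below mirrors the templates listed above and enforces the mutual exclusion between template, mode, and options. It is not the PR's implementation: `INDUCTOR_CONFIG_TEMPLATES` and `resolve_compile_kwargs` are hypothetical names, the `max_autotune` entry is left out because its overrides are not listed here, and allowing none of the three to be set is an assumption.

```python
from typing import Any, Optional

# Hypothetical copy of the template table; keys and values mirror the list above.
# max_autotune is omitted because its overrides are not given in this description.
INDUCTOR_CONFIG_TEMPLATES: dict[str, dict[str, Any]] = {
    "triton": {"cpp_wrapper": False},
    "cpp_wrapper": {"cpp_wrapper": True},
    "cudagraphs": {"triton.cudagraphs": True},
    "freezing": {"freezing": True},
    "tma": {"triton.enable_persistent_tma_matmul": True},
}


def resolve_compile_kwargs(
    template: Optional[str] = None,
    mode: Optional[str] = None,
    options: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build keyword arguments for torch.compile from at most one selector."""
    given = [
        name
        for name, value in (("template", template), ("mode", mode), ("options", options))
        if value is not None
    ]
    # Assumption: specifying none of the three is allowed and means plain Inductor.
    if len(given) > 1:
        raise ValueError(f"template, mode and options are mutually exclusive, got: {given}")

    kwargs: dict[str, Any] = {"backend": "inductor"}
    if template is not None:
        if template not in INDUCTOR_CONFIG_TEMPLATES:
            raise ValueError(f"unknown template: {template!r}")
        # A template expands to per-call options; no global config is modified.
        kwargs["options"] = dict(INDUCTOR_CONFIG_TEMPLATES[template])
    elif mode is not None:
        kwargs["mode"] = mode
    elif options is not None:
        kwargs["options"] = dict(options)
    return kwargs
```

The result could then be passed along as `torch.compile(model, **resolve_compile_kwargs(template="freezing"))`, where `model` is a placeholder. Keeping the overrides in the per-call `options` argument rather than writing to `torch._inductor.config` avoids leaking settings between benchmark runs.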
### 2. CUDA Graphs Compatibility Fix

File: `graph_net_bench/torch/test_compiler.py`

When CUDA Graphs is enabled, output tensor pointers are recorded to CUDA Graph buffers. Subsequent model calls overwrite these buffers, causing errors when the compiled output is accessed after the eager run. The fix clones the outputs immediately.

Note: `eval_backend_perf.py` and `eval_backend_diff.py` are unaffected (`torch.save`/`torch.load` creates independent copies).
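The cloning idea can be sketched as a small helper that copies every tensor-like leaf of a possibly nested output. This is an illustration under assumptions, not the code from `test_compiler.py`: `clone_outputs` is a hypothetical name, and it relies only on the fact that `torch.Tensor` provides a `.clone()` method, so it is written duck-typed and does not import torch.

```python
from typing import Any


def clone_outputs(outputs: Any) -> Any:
    """Recursively copy tensor-like leaves so later runs cannot overwrite them."""
    if isinstance(outputs, list):
        return [clone_outputs(item) for item in outputs]
    if isinstance(outputs, tuple):
        # Named tuples are flattened to plain tuples in this sketch.
        return tuple(clone_outputs(item) for item in outputs)
    if isinstance(outputs, dict):
        return {key: clone_outputs(value) for key, value in outputs.items()}
    # torch.Tensor exposes .clone(); objects without it are returned unchanged.
    clone = getattr(outputs, "clone", None)
    return clone() if callable(clone) else outputs
```

A call site might look like `compiled_out = clone_outputs(compiled_model(*inputs))`, with `compiled_model` and `inputs` as placeholders, so that the saved result no longer aliases the CUDA Graph buffer when the model is invoked again.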
### 3. Test

File: `test/inductor_backend_test.py` (new file, ~290 lines)

Test Coverage:
## Usage

## Documentation References

All configuration keys verified against PyTorch source code: